Search CORE

46 research outputs found

A scalable machine-learning approach to recognize chemical names within large text databases

Author: A Zamora
CH Davis
E Charniak
G Nenadic
I Donaldson
J Finkel
JD Wren
Jonathan D Wren
L Hirschman
LR Rabiner
M Krauthammer
M Narayanaswamy
MA Drake
MD Yandell
PAV Hall
S Albert
S Raychaudhuri
U Leser
WJ Wilbur
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2006

MOTIVATION: The use or study of chemical compounds permeates almost every scientific field, and in each of them the amount of textual information is growing rapidly. There is a need to accurately identify chemical names within text for a number of informatics efforts such as database curation, report summarization, tagging of named entities and keywords, or the development/curation of reference databases. RESULTS: A first-order Markov Model (MM) was evaluated for its ability to distinguish chemical names from other words, yielding ~93% recall in recognizing chemical terms and ~99% precision in rejecting non-chemical terms on smaller test sets. However, because total false-positive events increase with the number of words analyzed, the scalability of name recognition was measured by processing 13.1 million MEDLINE records. The method yielded precision ranging from 54.7% to 100%, depending upon the cutoff score used, averaging 82.7% for approximately 1.05 million putative chemical terms extracted. Extracted chemical terms were analyzed to estimate the number of spelling variants per term, which correlated with the total number of times the chemical name appeared in MEDLINE. This variability in term construction was found to affect both information retrieval and term mapping when using PubMed and Ovid.
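A first-order Markov model of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes character-level transitions, add-one smoothing, a per-transition log-likelihood ratio as the score, and two tiny hypothetical training lists. The abstract does not state the paper's actual features, smoothing or scoring, and `chemical_score`, `is_chemical` and `cutoff` are illustrative names.

```python
import math
from collections import Counter, defaultdict

BOUNDARY_START, BOUNDARY_END = "^", "$"  # assumed word-boundary markers

def transitions(word):
    """Return the first-order (previous char -> next char) transitions of a word."""
    seq = BOUNDARY_START + word.lower() + BOUNDARY_END
    return zip(seq, seq[1:])

def train(words):
    """Count character transitions for one class of words."""
    counts = defaultdict(Counter)
    for w in words:
        for a, b in transitions(w):
            counts[a][b] += 1
    return counts

def log_prob(counts, word, alphabet_size):
    """Log-likelihood of a word under a first-order model with add-one smoothing."""
    total = 0.0
    for a, b in transitions(word):
        row = counts.get(a, Counter())
        total += math.log((row[b] + 1) / (sum(row.values()) + alphabet_size))
    return total

# Hypothetical toy training lists; a real system would train on large curated vocabularies.
CHEMICALS = ["methanol", "ethanol", "propanol", "butanol", "phenol", "glycerol",
             "benzene", "toluene", "acetone", "chloroform", "chloride", "hydroxide"]
ORDINARY = ["patients", "study", "results", "treatment", "analysis", "disease",
            "expression", "cells", "clinical", "response", "increase", "method"]

CHEM_MODEL, WORD_MODEL = train(CHEMICALS), train(ORDINARY)
# Shared alphabet size: every training character plus the two boundary markers.
ALPHABET_SIZE = len({c for w in CHEMICALS + ORDINARY for c in w.lower()}) + 2

def chemical_score(word):
    """Per-transition log-likelihood ratio; positive values favour 'chemical'."""
    diff = log_prob(CHEM_MODEL, word, ALPHABET_SIZE) - log_prob(WORD_MODEL, word, ALPHABET_SIZE)
    return diff / (len(word) + 1)

def is_chemical(word, cutoff=0.0):
    """Accept a word as a putative chemical name if its score exceeds the cutoff."""
    return chemical_score(word) > cutoff

for w in ["ethanol", "results"]:
    print(w, round(chemical_score(w), 2), is_chemical(w))
```

Raising `cutoff` accepts fewer words, trading recall for precision; this is the same kind of trade-off the abstract reports when precision varies with the cutoff score over the MEDLINE run.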

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Some methods for blindfolded record linkage

BACKGROUND: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one party, often a third party. This necessarily invades personal privacy and requires complete trust in the intentions of that party and their ability to maintain security and confidentiality. Dusserre, Quantin, Bouzelat and colleagues have demonstrated that it is possible to use secure one-way hash transformations to carry out follow-up epidemiological studies without any party having to reveal identifying information about any of the subjects – a technique which we refer to as "blindfolded record linkage". A limitation of their method is that only exact comparisons of values are possible, although phonetic encoding of names and other strings can be used to allow for some types of typographical variation and data errors. METHODS: A method is described which permits the calculation of a general similarity measure, the n-gram score, without having to reveal the data being compared, albeit at some cost in computation and data communication. This method can be combined with public key cryptography and automatic estimation of linkage model parameters to create an overall system for blindfolded record linkage. RESULTS: The system described offers good protection against misdeeds or security failures by any one party, but remains vulnerable to collusion between or simultaneous compromise of two or more parties involved in the linkage operation. In order to reduce the likelihood of this, the use of last-minute allocation of tasks to substitutable servers is proposed. Proof-of-concept computer programmes written in the Python programming language are provided to illustrate the similarity comparison protocol. 
CONCLUSION: Although the protocols described in this paper are not unconditionally secure, they do suggest the feasibility, with the aid of modern cryptographic techniques and high-speed communication networks, of a general-purpose probabilistic record linkage system which permits record linkage studies to be carried out with negligible risk of invasion of personal privacy.
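The idea of scoring string similarity without revealing the strings can be illustrated with a deliberately simplified Python sketch. It is not the protocol described in the paper. Here two data custodians are assumed to share a secret key, each replaces its character bigrams with HMAC-SHA256 digests, and a third party computes a Dice-style bigram score (assumed here as the n-gram score) on the digests alone. Unlike the paper's scheme, this shortcut still leaks bigram-set sizes and overlap counts and may be open to frequency analysis. The key and function names are hypothetical.

```python
import hashlib
import hmac

def bigrams(value):
    """Character bigrams of a lower-cased, whitespace-normalised string padded with spaces."""
    s = " " + " ".join(value.lower().split()) + " "
    return {s[i:i + 2] for i in range(len(s) - 1)}

def blind_bigrams(value, key):
    """Replace each bigram with its HMAC-SHA256 digest under a secret shared by the data holders."""
    return {hmac.new(key, g.encode("utf-8"), hashlib.sha256).hexdigest() for g in bigrams(value)}

def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) over two sets of (blinded) bigrams."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Usage: each custodian blinds its own value; only digests reach the comparing party.
KEY = b"hypothetical-shared-secret"  # assumed to be exchanged out of band
left = blind_bigrams("Catherine Smith", KEY)
right = blind_bigrams("Katherine Smith", KEY)
print(round(dice(left, right), 3))  # 0.867: 13 shared bigrams out of 15 + 15
```

Because HMAC is deterministic for a fixed key, the score computed on digests equals the score on the plaintext bigrams, so the comparing party learns how similar two values are but not the values themselves.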

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

The Australian National University

Testing a global standard for quantifying species recovery and assessing conservation impact

Author: Acebes P
Akçakaya HR
Alfaro-Shigueto J
Alvarez-Clare S
Andriantsimanarilafy RR
Arbetman M
Azat C
Bacchetta G
Badola R
Barcelos LMD
Barreiros JP
Basak S
Bennett EL
Berger DJ
Bhattacharyya S
Bino G
Borges PAV
Boughton RK
Brockmann HJ
Brooks TM
Buckley HL
Burfield IJ
Burton J
Camacho-Badani T
Cano-Alonso LS
Carmichael RH
Carrero C
Carroll JP
Catsadorakis G
Chapple DG
Chapron G
Chowdhury GW
Claassens L
Cogoni D
Constantine R
Craig CA
Cunningham AA
Da Rosa P
Dahal N
Daltry JC
Das GC
Dasgupta N
Davey A
Davies K
De Lima TM
Develey P
Di Febbraro M
Dos Santos AS
Elangovan V
Fairclough D
Fenu G
Fernandes FM
Fernandez EP
Finucci B
Foley CM
Ford M
Forstner MRJ
Földesi R
Garcia-Sandoval R
García N
Gardner PC
Garibay-Orijel R
Gatan-Balbas M
Gauto I
Ghazi MGU
Godfrey SS
Gollock M
González BA
Grace MK
Grant TD
Gray T
Gregory AJ
Gryzenhout M
Guernsey NC
Gupta G
Hagen C
Hagen CA
Hall MB
Hallerman E
Hare K
Hart T
Hartdegen R
Harvey-Brown Y
Hatfield R
Hawke T
Heath A
Hedges S
Hermes C
Hilton-Taylor C
Hitchmough R
Hochkirch A
Hoffmann M
Hoffmann PM
Howarth C
Hudson MA
Hussain SA
Huveneers C
Jacques H
Jenkins R
Jorgensen D
Katdare S
Katsis LKD
Kaul R
Kaunda-Arara B
Keith DA
Keith-Diagne L
Kraus DT
Lindeman K
Linsky J
Long B
Louis Jr E
Loy A
Lughadha EN
Mallon DP
Mangel JC
Marinari PE
Martin GM
Martinelli G
McGowan PJK
McInnes A
Meijaard E
Mendes ETB
Millard MJ
Milner-Gulland EJ
Mirande C
Money D
Monks JM
Morales CL
Mumu NN
Negrao R
Nguyen AH
Niloy MNH
Norbury GL
Nordmeyer C
Norris D
O'Brien M
Oda GA
Orsenigo S
Outerbridge ME
Pasachnik S
Pike C
Pilkington F
Plumb G
Portela RDCQ
Prohaska A
Pérez-Jiménez JC
Quintana MG
Rakotondrasoa EF
Ranglack DH
Rankou H
Rawat AP
Reardon JT
Rheingantz ML
Richter SC
Rivers MC
Rodriguez JP
Rogers LR
Rose P
Royer E
Ryan C
Sadovy de Mitcheson YJ
Salmon L
Salvador CH
Samways MJ
Sanjuan T
Sasaki H
Schutz E
Scott HA
Scott RM
Serena F
Sharma SP
Shuey JA
Silva CJP
Simaika JP
Smith DR
Spaet JLY
Stephenson PJ
Stuart SN
Sultana S
Talukdar BK
Tatayah V
Thomas P
Tringali A
Trinh-Dinh H
Tuboi C
Usmani AA
Van Grunsven RHA
Van Weerd M
Vasco-Palacios AM
Virens J
Vié J-C
Walker A
Wallace B
Waller LJ
Wang H
Wearn OR
Weigmann S
Willcox D
Woinarski J
Yong JWH
Young RP
Young S
Publication venue: Wiley-Blackwell
Publication date: 01/01/2021

Recognizing the imperative to evaluate species recovery and conservation impact, in 2012 the International Union for Conservation of Nature (IUCN) called for development of a “Green List of Species” (now the IUCN Green Status of Species). A draft Green Status framework for assessing species’ progress toward recovery, published in 2018, proposed 2 separate but interlinked components: a standardized method (i.e., measurement against benchmarks of species’ viability, functionality, and preimpact distribution) to determine current species recovery status (herein species recovery score) and application of that method to estimate past and potential future impacts of conservation based on 4 metrics (conservation legacy, conservation dependence, conservation gain, and recovery potential). We tested the framework with 181 species representing diverse taxa, life histories, biomes, and IUCN Red List categories (extinction risk). Based on the observed distribution of species’ recovery scores, we propose the following species recovery categories: fully recovered, slightly depleted, moderately depleted, largely depleted, critically depleted, extinct in the wild, and indeterminate. Fifty-nine percent of tested species were considered largely or critically depleted. Although there was a negative relationship between extinction risk and species recovery score, variation was considerable. Some species in lower risk categories were assessed as farther from recovery than those at higher risk. This emphasizes that species recovery is conceptually different from extinction risk and reinforces the utility of the IUCN Green Status of Species to more fully understand species conservation status. Although extinction risk did not predict conservation legacy, conservation dependence, or conservation gain, it was positively correlated with recovery potential. 
Only 1.7% of tested species were categorized as zero across all 4 of these conservation impact metrics, indicating that conservation has played, or will play, a role in improving or maintaining species status for the vast majority of these species. Based on our results, we devised an updated assessment framework that introduces the option of using a dynamic baseline to assess future impacts of conservation over the short term, to avoid misleading results that were generated in a small number of cases, and that redefines short term as 10 years to better align with conservation planning. These changes are reflected in the IUCN Green Status of Species Standard.

UCL Discovery

Curating clinical science: what is the future of academic publishing?

Author: Hall PAV
Publication venue: Wiley

Crossref

Relational Architecture

Author: CW Bachman
DD Chamberlin
EF Codd
M Stonebraker
PAV Hall
W Kent
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1988